BlackfootFish <- BlackfootFish |>
  mutate(length = (length - min(length, na.rm = TRUE)) /
           (max(length, na.rm = TRUE) - min(length)),
         weight = (weight - min(weight, na.rm = TRUE)) /
           (max(weight, na.rm = TRUE) - min(length, na.rm = TRUE))
  )

Lab 7: Functions and Fish
The goal of this lab is to learn more about exploring missing data and to practice writing modular code.
1 The Data
This lab’s dataset concerns mark-recapture data on fish from the Blackfoot River, outside of Helena, Montana.
Mark-recapture is a common method used by ecologists to estimate the size of an animal population when it is impossible to conduct a census (count every animal). This method works by “tagging” animals with a tracking device, so scientists can track their movement and/or presence.
You may download the BlackfootFish.csv dataset here.
2 Part One: Summaries and Plots (Midterm Review)
2.1 Summarizing Missing Data
The measurements of each captured fish were taken by a biologist on a raft. The lack of a “laboratory setting” opens the door to measurement errors.
What variable(s) have missing values present?
How many observations within each variable have missing values?
Output both pieces of information in one table!
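As a starting point, one possible shape for such a table is sketched below. The toy_fish data frame and its values are made up for illustration (they are not the real BlackfootFish data), and the sketch assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, NOT the real BlackfootFish dataset
toy_fish <- tibble(
  year   = c(1989, 1989, 1990, 1990),
  length = c(288, 310, NA, 245),
  weight = c(175, NA, NA, 100)
)

missing_summary <- toy_fish |>
  # Count the NAs in every column
  summarise(across(everything(), ~ sum(is.na(.x)))) |>
  # Reshape so each variable gets its own row
  pivot_longer(everything(),
               names_to = "variable",
               values_to = "n_missing") |>
  # Keep only the variables that actually have missing values
  filter(n_missing > 0)

missing_summary
```

The resulting table has one row per variable with missing values, so it answers both questions at once.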
2.2 Visualizing Missing Data
Unfortunately, these missing values are not confined to a single year, trip, or section of river.
Create a thoughtful visualization exploring the frequency of missing values across the different years, sections, and trips.
This is a great opportunity to demonstrate what you’ve learned about visualizations thus far and to show off your creativity!
3 Part Two: Adjusting the Data (Function Writing)
If I wanted to rescale every quantitative variable in my dataset so that its values fall between 0 and 1, I could use the following formula:
\[y_{scaled} = \frac{y_i - \min\{y_1, y_2, \ldots, y_n\}}{\max\{y_1, y_2, \ldots, y_n\} - \min\{y_1, y_2, \ldots, y_n\}}\]
The following R code would carry out this rescaling procedure for the length and weight columns of the data:
This process of duplicating an action multiple times makes it difficult to understand the intent of the process. Additionally, it makes it very difficult to spot the mistakes. Did you spot the mistake in the weight conversion?
Often you will find yourself in the position of needing to find a function that performs a specific task, but you do not know of a function or a library that would help you. You could spend time Googling for a solution, but in the amount of time it takes you to find something you could have already written your own function!
3.1 Writing a Function
Let’s transform the repeated process above into a rescale_01() function.
- The function should take a single vector as its input.
- The function should return the rescaled vector.
Think about the “efficiency” of your function. Are you calling the same function multiple times?
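One way to avoid repeated calls is to compute the minimum and maximum once with range(). The following is a minimal sketch under that assumption, not the only valid solution.

```r
# Sketch of rescale_01(): maps a numeric vector onto the interval [0, 1]
rescale_01 <- function(x) {
  # range() returns c(min, max) in one call; NAs are ignored
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale_01(c(2, 4, 10, NA))
```

Missing values in the input stay missing in the output, while every other value is shifted and scaled.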
3.2 Adding Stops
Now, let’s incorporate some checks into your function! Modify your previous code to create the following checks:
- the function should stop if the input vector is not numeric
- the function should stop if the length of the vector is not greater than 1
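The two checks listed above could be written with stop(), as in the sketch below. The error messages are my own wording and are only an assumption about what a helpful message might say.

```r
# Sketch of rescale_01() with input checks added
rescale_01 <- function(x) {
  if (!is.numeric(x)) {
    stop("Input vector must be numeric.")
  }
  if (length(x) <= 1) {
    stop("Input vector must have more than one element.")
  }

  # Compute the minimum and maximum once, ignoring NAs
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# try() captures each error so the script keeps running
try(rescale_01(c("a", "b")))
try(rescale_01(5))
```

Placing the checks first means the function fails early, before any arithmetic is attempted on an unsuitable input.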
3.3 Performing a Simple Test
First, test your function on the simple vector below. Add code that verifies the maximum of your rescaled vector is 1 and the minimum is 0!
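A verification along these lines might look like the sketch below. It repeats a compact version of rescale_01() so the example runs on its own, and it uses the test vector given in this section.

```r
# Compact sketch of rescale_01(), repeated so this example is self-contained
rescale_01 <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1)
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

x <- c(1:85, NA)
rescaled <- rescale_01(x)

# Both conditions must hold, otherwise stopifnot() raises an error
stopifnot(
  min(rescaled, na.rm = TRUE) == 0,
  max(rescaled, na.rm = TRUE) == 1
)
```

If the code runs silently, both conditions were met.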
x <- c(1:85, NA)

3.4 Performing a More Difficult Test
Next, let’s test the function on the length column of the BlackfootFish dataset.
Make plots of the original values of length and the rescaled values of length. Output your plots stacked vertically, so the reader can confirm the only aspect that has changed is the scale.
By stacked vertically, I mean that the x-axis of the two plots should be stacked on top of each other!
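One of several ways to get this layout is to reshape the data and facet into a single column, as sketched below. The toy_fish values are made up for illustration, the choice of a histogram is only an assumption about what plot you might prefer, and the sketch assumes dplyr, tidyr, and ggplot2 are installed.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Compact sketch of rescale_01(), repeated so this example is self-contained
rescale_01 <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1)
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical toy data, NOT the real BlackfootFish dataset
toy_fish <- tibble(length = c(120, 245, 288, 310, 402, 455))

plot_data <- toy_fish |>
  mutate(rescaled = rescale_01(length)) |>
  rename(original = length) |>
  # One row per (fish, scale) pair so the two versions can be faceted
  pivot_longer(c(original, rescaled),
               names_to = "scale",
               values_to = "value")

# ncol = 1 stacks the panels; free x-axes let each panel keep its own scale
p <- ggplot(plot_data, aes(x = value)) +
  geom_histogram(bins = 5) +
  facet_wrap(~ scale, ncol = 1, scales = "free_x") +
  labs(x = "Length", y = "Number of fish")

p
```

Because the panels share everything except the x-axis scale, the shape of the two distributions can be compared directly.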
3.5 Incorporating Variables
Suppose you would like to make a more general rescale_column() function that performs this operation on any variable within a dataset. Ideally, your function would take a dataframe and a variable name as inputs and return a dataframe where the variable has been rescaled.
Create a rescale_column() function that accepts two arguments:
- a dataframe
- the name(s) of the variable(s) to be rescaled
The body of the function should call the original rescale_01() function you wrote previously.
If you are struggling with this task, I would recommend re-reading the Data frame functions section from R for Data Science.
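The embracing pattern described in that section could be applied as in the sketch below. The argument names df and vars and the toy_fish data are my own assumptions for illustration, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Compact sketch of rescale_01(), repeated so this example is self-contained
rescale_01 <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1)
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Sketch of rescale_column(): `df` and `vars` are hypothetical argument names
rescale_column <- function(df, vars) {
  stopifnot(is.data.frame(df))
  df |>
    # {{ vars }} passes the caller's column selection through to across()
    mutate(across(.cols = {{ vars }}, .fns = rescale_01))
}

# Hypothetical toy data, NOT the real BlackfootFish dataset
toy_fish <- tibble(
  year   = c(1989, 1990, 1991),
  length = c(100, 200, 300),
  weight = c(10, NA, 30)
)

rescaled <- rescale_column(toy_fish, c(length, weight))
rescaled
```

A single call rescales both selected columns and leaves year untouched.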
3.6 Another Function Test
Alright, now let’s put your rescale_column() function to work! Use your rescale_column() function to rescale both the length and weight columns.
I expect that you carry out this process by calling the rescale_column() function one time!
I advise against using functions like mutate_at(), which have been superseded.
4 Challenge: Incorporating Multiple Inputs
A frequently used measurement for fish health is a “condition index.” (Wikipedia article) The following simple equation can be used to calculate the approximate condition index of a fish:
\[\text{condition index} = \frac{\text{weight}}{\text{length}^3} \times 100\]
4.1 Part 1
There are specific units required for the calculation of a condition index. Length must be in millimeters, and weight must be in grams. Inspect the length and weight variables to decide if you believe these are the correct units associated with these measurements—this will likely require Googling what “typical” measurements of trout are.
Replacing Impossible Measurements with NAs
Based on your research, write function(s) to handle the unlikely / impossible measurements included in the dataset. Your function(s) should accept three inputs (1) a vector of measurements, (2) the minimum value you believe is “reasonable,” and (3) the maximum value you believe is “reasonable.” If a value falls outside these bounds, you should replace it with an NA.
If you are struggling with the structure of your function, I would suggest re-reading the Mutating Function section from R for Data Science.
Use your function to modify the length and weight columns of the BlackfootFish dataset, removing values you believe are “unreasonable.”
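A function of this kind might be structured as in the sketch below. The name replace_unreasonable() and the bounds in the example call are placeholders I made up, not recommended cutoffs. Choosing defensible bounds is still part of your research.

```r
# Sketch: replace values outside [min_val, max_val] with NA
# (function name and argument names are hypothetical)
replace_unreasonable <- function(x, min_val, max_val) {
  stopifnot(is.numeric(x), is.numeric(min_val), is.numeric(max_val))
  stopifnot(min_val <= max_val)

  # Flag non-missing values that fall outside the bounds
  out_of_bounds <- !is.na(x) & (x < min_val | x > max_val)
  x[out_of_bounds] <- NA
  x
}

# Placeholder bounds, chosen only to demonstrate the behaviour
replace_unreasonable(c(5, 50, 500, NA), min_val = 10, max_val = 100)
```

Because the function takes and returns a vector, it can be used inside mutate() on the length and weight columns.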
4.2 Part 2
Write a function which calculates the condition index of a fish, given inputs of weight and length.
Consider whether your function will accept vectors as inputs or if it will accept variable names as inputs!
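If you choose vectors as inputs, the function could look like the sketch below, which applies the equation from this section directly. The name condition_index() and the example values are my own assumptions.

```r
# Sketch: condition index from vectors of weight (grams) and length (millimeters)
# (function name is hypothetical)
condition_index <- function(weight, length) {
  stopifnot(is.numeric(weight), is.numeric(length))
  (weight / length^3) * 100
}

# Made-up measurements for two fish
condition_index(weight = c(175, 400), length = c(288, 350))
```

A vector-based version like this can be called inside mutate(), whereas a version that accepts variable names would also need the dataframe as an argument.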
4.3 Part 3
Make a thoughtful visualization of how fish conditions have varied over the duration of this study.
This is a great opportunity to demonstrate what you’ve learned about visualizations thus far and to show off your creativity!
4.4 Challenge Submission
Your challenge should be submitted as a separate file, not at the bottom of your Lab 7 file. Please submit your rendered HTML file. You can copy and paste this code into a new Quarto file. Your Challenge 7 submission should include only the code necessary for completing the Challenge, nothing else.
You will submit only your rendered HTML file. Your HTML file is required to have the following specifications in the YAML options (at the top of your document):
- be self-contained (self-contained: true)
- include your source code (code-tools: true)
- include all your code and output (echo: true)
- include no messages (messages: false) or warnings (warnings: false) from loading in packages or the data
If any of the options are not included, your Lab 7 or Challenge 7 assignment will receive an “Incomplete” and you will be required to submit a revision for the formatting of your assignment.